Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap Sampling

نویسندگان

  • Dharmendra S Panwar
  • Kshitij Pathak
چکیده

Although publicly accessible databases containing speech documents. It requires a great deal of time and effort required to keep them up to date is often burdensome. In an effort to help identify speaker of speech if text is available, text-mining tools, from the machine learning discipline, it can be applied to help in this process also. Here, we describe and evaluate document classification algorithms i.e. a combo pack of text mining and classification. This task asked participants to design classifiers for identifying documents containing speech related information in the main literature, and evaluated them against one another. Expected systems utilizes a novel approach of k -nearest neighbour classification and compare its performance by taking different values of k. Keywords— Data Mining, Text Mining, Classification, K-nearest neighbour, KNN

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Weighted K-Nearest Neighbor Classification Algorithm Based on Genetic Algorithm

K-Nearest Neighbor (KNN) is one of the most popular algorithms for data classification. Many researchers have found that the KNN algorithm accomplishes very good performance in their experiments on different datasets. The traditional KNN text classification algorithm has limitations: calculation complexity, the performance is solely dependent on the training set, and so on. To overcome these li...

متن کامل

Impact of the Sakoe-Chiba Band on the DTW Time Series Distance Measure for kNN Classification

For classification of time series, the simple 1-nearest neighbor (1NN) classifier in combination with an elastic distance measure such as Dynamic Time Warping (DTW) distance is considered superior in terms of classification accuracy to many other more elaborate methods, including k-nearest neighbor (kNN) with neighborhood size k > 1. In this paper we revisit this apparently peculiar relationshi...

متن کامل

A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection

K nearest neighbor algorithm is one of the most frequently used techniques in data mining for its integrity and performance. Though the KNN algorithm is highly effective in many cases, it has some essential deficiencies, which affects the classification accuracy of the algorithm. First, the effectiveness of the algorithm is affected by redundant and irrelevant features. Furthermore, this algori...

متن کامل

ارائه روشی برای استخراج کلمات کلیدی و وزن‌دهی کلمات برای بهبود طبقه‌بندی متون فارسی

Due to ever-increasing information expansion and existing huge amount of unstructured documents, usage of keywords plays a very important role in information retrieval. Because of a manually-extraction of keywords faces various challenges, their automated extraction seems inevitable. In this research, it has been tried to use a thesaurus, (a structured word-net) to automatically extract them. A...

متن کامل

A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure

Author profiling is a text classification technique, which is used to predict the profiles of unknown text by analyzing their writing styles. Author profiles are the characteristics of the authors like gender, age, nativity language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014